Provide a better error for client-side shard filtering. #23
Nicole, I think it's ready to merge. I want to give it a try on Dataflow. Thanks,
@calbach PTAL when you have time? And let me know if you have an alternative suggestion for this check.
Nicole, it's good enough. Just mention in the README that you are being restrictive in order to guarantee uniqueness across shards. Until we find a better solution, which would require some deeper thinking, let's just release it. Just give a reasonable explanation so users understand why.
Paul
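Why the restriction gives uniqueness can be sketched as follows. This is a minimal illustration, not the project's filter: it assumes the server-side query returns every record that overlaps a shard, and that the client-side filter keeps a record only when its start position falls inside the half-open shard range. `StrictShardFilter`, `GenomicRecord` and `filterStrict` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class StrictShardFilter {

  /** Hypothetical stand-in for a read or variant; only its start position matters here. */
  static final class GenomicRecord {
    final String id;
    final long start;

    GenomicRecord(String id, long start) {
      this.id = id;
      this.start = start;
    }
  }

  /**
   * Keeps only records whose start lies in [shardStart, shardEnd).
   * Assumption: the server returns every record that overlaps the shard, so a record
   * spanning a boundary comes back for two shards; the start check keeps it in one.
   */
  static List<GenomicRecord> filterStrict(
      List<GenomicRecord> overlapping, long shardStart, long shardEnd) {
    List<GenomicRecord> kept = new ArrayList<>();
    for (GenomicRecord r : overlapping) {
      if (r.start >= shardStart && r.start < shardEnd) {
        kept.add(r);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // "r2" starts at 950 and extends past 1000, so both shard queries return it.
    List<GenomicRecord> shardA =
        List.of(new GenomicRecord("r1", 10), new GenomicRecord("r2", 950));
    List<GenomicRecord> shardB =
        List.of(new GenomicRecord("r2", 950), new GenomicRecord("r3", 1200));
    System.out.println(filterStrict(shardA, 0, 1000).size());    // prints 2
    System.out.println(filterStrict(shardB, 1000, 2000).size()); // prints 1
  }
}
```

The filter can only do this if the start position is present in each returned record, which is why a request whose `fields` string omits it needs to fail with a clear error.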
```java
@Override
public void initialize(GenomicsRequest<?> search) {
  if (null != fields) {
    for (String requiredField : REQUIRED_STRICT_SHARD_FIELDS.keySet()) {
      // ... (rest of the check not shown in this review excerpt)
    }
  }
}
```
Perhaps this could be extracted to a common utility which accepts the fields string and the map of patterns to be matched.
Refactored!
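A shared utility along the lines suggested in review, one that accepts the `fields` string and a map from each required field to a pattern that detects it, might look roughly like this. It is a sketch, not the refactored code from this PR: `FieldsCheck`, `checkRequiredFields` and the example pattern for a variant start position are hypothetical. A null `fields` string is skipped, mirroring the `null != fields` guard in the excerpt above, on the assumption that omitting `fields` requests the full resource.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public final class FieldsCheck {

  private FieldsCheck() {}

  /**
   * Throws a descriptive error if the partial-response fields string does not
   * request every field that client-side strict shard filtering relies on.
   */
  static void checkRequiredFields(String fields, Map<String, Pattern> requiredFields) {
    // No fields string: assume the full resource comes back, so nothing is missing.
    if (fields == null) {
      return;
    }
    for (Map.Entry<String, Pattern> required : requiredFields.entrySet()) {
      if (!required.getValue().matcher(fields).find()) {
        throw new IllegalArgumentException(
            "Strict shard boundaries are enforced client-side and require the field '"
                + required.getKey() + "', but the requested fields were: " + fields);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Pattern> required = new LinkedHashMap<>();
    // Hypothetical example: variants filtered on their start position.
    required.put("variants.start", Pattern.compile("variants\\(.*\\bstart\\b.*\\)"));

    checkRequiredFields("nextPageToken,variants(id,start,end)", required); // passes
    try {
      checkRequiredFields("nextPageToken,variants(id,end)", required);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Keeping the field-to-pattern map as a parameter lets the read and variant sources share one check while each supplies its own required fields.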
Fantabulous - I love it! Thanks guys! Have a great weekend!
Provide a better error for client-side shard filtering.
Now that we are doing client-side filtering for strict shard boundaries, we need to check that we are requesting the field that the filter will require and return a useful error if not.
Also increased the default number of bases per shard. This yields a reasonable number of tasks for sharded jobs running over the whole human genome.
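The effect of a larger default on task count is simple arithmetic, sketched below under stated assumptions: shards are assumed not to span contigs, so each contig contributes the ceiling of its length divided by the bases per shard. The contig names, lengths and shard sizes are made-up illustration values, and the PR does not state the actual default. `ShardCount` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShardCount {

  /** Ceiling division: a partial final shard still becomes a task. */
  static long shardsForContig(long contigLength, long basesPerShard) {
    return (contigLength + basesPerShard - 1) / basesPerShard;
  }

  /** Assumes shards never cross contig boundaries, so counts are summed per contig. */
  static long totalShards(Map<String, Long> contigLengths, long basesPerShard) {
    long total = 0;
    for (long length : contigLengths.values()) {
      total += shardsForContig(length, basesPerShard);
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Long> contigs = new LinkedHashMap<>();
    contigs.put("chrA", 2_500_000L); // hypothetical lengths
    contigs.put("chrB", 1_000_000L);
    System.out.println(totalShards(contigs, 1_000_000L));  // 3 + 1 = 4 tasks
    System.out.println(totalShards(contigs, 10_000_000L)); // 1 + 1 = 2 tasks
  }
}
```

At whole-genome scale the count falls roughly in proportion to the shard size: for about 3 billion bases, going from a hypothetical 100,000 to 1,000,000 bases per shard takes the job from roughly 30,000 tasks to roughly 3,000.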